title: “World Universities Rankings Advanced Analysis” author: “Gabriel Preda” date: “Created: 2018-03-25; Last updated:2023-05-19” output: html_document: number_sections: false toc: true fig_width: 8 fig_height: 6 theme: cosmo highlight: tango code_folding: hide —
# Introduction
The University of Bologna, the oldest university in the World, founded in 1088 (Wikimedia Commons)
The World Universities Rankings is using data about Universities Rankings from multiple sources: Center for World University Rankings (CWUR), Shanghai University Rankings (AWUR), Times Higher Education University Rankings (THE). Each of these sources is using a different hierarchization system. Besides the World University Ranking systems, there are also information about the schools and countries, about the education expenditure (per country and year) and education attainment (per country and year, multiple dimmensions).
We include the R libraries used for data input, processing, analysis and visualization. We are using pacman for library management.
#use pacman for package management
if (!require("pacman")) install.packages("pacman")
pacman::p_load(knitr, kableExtra, formattable, dplyr, tm, tidyr, wordcloud,
ggplot2, treemap, gridExtra, grid, fmsb, leaflet, plotly,reshape, corrplot,ggrepel)
options(knitr.table.format = "html")
We read the files in the dataset.
#input of datasets from world-university-rankings
cwurData <- read.csv("data/school_data/cwurData.csv")
educationExpenditure <- read.csv("data/school_data/education_expenditure_supplementary_data.csv")
educationalAttainment <- read.csv("data/school_data/educational_attainment_supplementary_data.csv")
schoolCountry <- read.csv("data/school_data/school_and_country_table.csv")
shanghaiData <- read.csv("data/school_data/shanghaiData.csv")
timesData <- read.csv("data/school_data/timesData.csv")
We have 6 data files in University Rankings dataset, as following:
Let’s see the first few rows of each data file and also glimpse these data files.
##CWUR
#knitr::kable(head(cwurData,10),caption="Center World University Rankings information (first 10 rows)")
kable(head(cwurData,10), "html") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
| world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | NA | 5 | 100.00 | 2012 |
| 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | NA | 1 | 91.67 | 2012 |
| 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | NA | 15 | 89.50 | 2012 |
| 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | NA | 50 | 86.17 | 2012 |
| 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | NA | 18 | 85.21 | 2012 |
| 6 | Princeton University | USA | 5 | 8 | 14 | 2 | 53 | 33 | 26 | NA | 101 | 82.50 | 2012 |
| 7 | University of Oxford | United Kingdom | 2 | 13 | 28 | 9 | 15 | 13 | 19 | NA | 26 | 82.34 | 2012 |
| 8 | Yale University | USA | 6 | 14 | 31 | 12 | 14 | 6 | 15 | NA | 66 | 79.14 | 2012 |
| 9 | Columbia University | USA | 7 | 23 | 21 | 10 | 13 | 12 | 14 | NA | 5 | 78.86 | 2012 |
| 10 | University of California, Berkeley | USA | 8 | 16 | 52 | 6 | 6 | 5 | 3 | NA | 16 | 78.55 | 2012 |
glimpse(cwurData)
## Rows: 2,200
## Columns: 14
## $ world_rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ institution <chr> "Harvard University", "Massachusetts Institute of…
## $ country <chr> "USA", "USA", "USA", "United Kingdom", "USA", "US…
## $ national_rank <int> 1, 2, 3, 1, 4, 5, 2, 6, 7, 8, 9, 10, 11, 1, 12, 1…
## $ quality_of_education <int> 7, 9, 17, 10, 2, 8, 13, 14, 23, 16, 15, 21, 31, 3…
## $ alumni_employment <int> 9, 17, 11, 24, 29, 14, 28, 31, 21, 52, 26, 42, 16…
## $ quality_of_faculty <int> 1, 3, 5, 4, 7, 2, 9, 12, 10, 6, 8, 14, 24, 31, 20…
## $ publications <int> 1, 12, 4, 16, 37, 53, 15, 14, 13, 6, 34, 22, 9, 8…
## $ influence <int> 1, 4, 2, 16, 22, 33, 13, 6, 12, 5, 20, 21, 10, 19…
## $ citations <int> 1, 4, 2, 11, 22, 26, 19, 15, 14, 3, 28, 16, 8, 23…
## $ broad_impact <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ patents <int> 5, 1, 15, 50, 18, 101, 26, 66, 5, 16, 101, 10, 9,…
## $ score <dbl> 100.00, 91.67, 89.50, 86.17, 85.21, 82.50, 82.34,…
## $ year <int> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2…
##Shanghai
knitr::kable(head(shanghaiData,10),caption="Shanghai Ranking information (first 10 rows)")
| world_rank | university_name | national_rank | total_score | alumni | award | hici | ns | pub | pcp | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Harvard University | 1 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 72.4 | 2005 |
| 2 | University of Cambridge | 1 | 73.6 | 99.8 | 93.4 | 53.3 | 56.6 | 70.9 | 66.9 | 2005 |
| 3 | Stanford University | 2 | 73.4 | 41.1 | 72.2 | 88.5 | 70.9 | 72.3 | 65.0 | 2005 |
| 4 | University of California, Berkeley | 3 | 72.8 | 71.8 | 76.0 | 69.4 | 73.9 | 72.2 | 52.7 | 2005 |
| 5 | Massachusetts Institute of Technology (MIT) | 4 | 70.1 | 74.0 | 80.6 | 66.7 | 65.8 | 64.3 | 53.0 | 2005 |
| 6 | California Institute of Technology | 5 | 67.1 | 59.2 | 68.6 | 59.8 | 65.8 | 52.5 | 100.0 | 2005 |
| 7 | Columbia University | 6 | 62.3 | 79.4 | 60.6 | 56.1 | 54.2 | 69.5 | 45.4 | 2005 |
| 8 | Princeton University | 7 | 60.9 | 63.4 | 76.8 | 60.9 | 48.7 | 48.5 | 59.1 | 2005 |
| 9 | University of Chicago | 8 | 60.1 | 75.6 | 81.9 | 50.3 | 44.7 | 56.4 | 42.2 | 2005 |
| 10 | University of Oxford | 2 | 59.7 | 64.3 | 59.1 | 48.4 | 55.6 | 68.4 | 53.2 | 2005 |
glimpse(shanghaiData)
## Rows: 4,897
## Columns: 11
## $ world_rank <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11…
## $ university_name <chr> "Harvard University", "University of Cambridge", "Stan…
## $ national_rank <chr> "1", "1", "2", "3", "4", "5", "6", "7", "8", "2", "9",…
## $ total_score <dbl> 100.0, 73.6, 73.4, 72.8, 70.1, 67.1, 62.3, 60.9, 60.1,…
## $ alumni <dbl> 100.0, 99.8, 41.1, 71.8, 74.0, 59.2, 79.4, 63.4, 75.6,…
## $ award <dbl> 100.0, 93.4, 72.2, 76.0, 80.6, 68.6, 60.6, 76.8, 81.9,…
## $ hici <dbl> 100.0, 53.3, 88.5, 69.4, 66.7, 59.8, 56.1, 60.9, 50.3,…
## $ ns <dbl> 100.0, 56.6, 70.9, 73.9, 65.8, 65.8, 54.2, 48.7, 44.7,…
## $ pub <dbl> 100.0, 70.9, 72.3, 72.2, 64.3, 52.5, 69.5, 48.5, 56.4,…
## $ pcp <dbl> 72.4, 66.9, 65.0, 52.7, 53.0, 100.0, 45.4, 59.1, 42.2,…
## $ year <int> 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, …
##Times
knitr::kable(head(timesData,10),caption="Times Higher Education World University Rankings data information (first 10 rows)")
| world_rank | university_name | country | teaching | international | research | citations | income | total_score | num_students | student_staff_ratio | international_students | female_male_ratio | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Harvard University | United States of America | 99.7 | 72.4 | 98.7 | 98.8 | 34.5 | 96.1 | 20,152 | 8.9 | 25% | 2011 | |
| 2 | California Institute of Technology | United States of America | 97.7 | 54.6 | 98.0 | 99.9 | 83.7 | 96.0 | 2,243 | 6.9 | 27% | 33 : 67 | 2011 |
| 3 | Massachusetts Institute of Technology | United States of America | 97.8 | 82.3 | 91.4 | 99.9 | 87.5 | 95.6 | 11,074 | 9.0 | 33% | 37 : 63 | 2011 |
| 4 | Stanford University | United States of America | 98.3 | 29.5 | 98.1 | 99.2 | 64.3 | 94.3 | 15,596 | 7.8 | 22% | 42 : 58 | 2011 |
| 5 | Princeton University | United States of America | 90.9 | 70.3 | 95.4 | 99.9 |
|
94.2 | 7,929 | 8.4 | 27% | 45 : 55 | 2011 |
| 6 | University of Cambridge | United Kingdom | 90.5 | 77.7 | 94.1 | 94.0 | 57.0 | 91.2 | 18,812 | 11.8 | 34% | 46 : 54 | 2011 |
| 6 | University of Oxford | United Kingdom | 88.2 | 77.2 | 93.9 | 95.1 | 73.5 | 91.2 | 19,919 | 11.6 | 34% | 46 : 54 | 2011 |
| 8 | University of California, Berkeley | United States of America | 84.2 | 39.6 | 99.3 | 97.8 |
|
91.1 | 36,186 | 16.4 | 15% | 50 : 50 | 2011 |
| 9 | Imperial College London | United Kingdom | 89.2 | 90.0 | 94.5 | 88.3 | 92.9 | 90.6 | 15,060 | 11.7 | 51% | 37 : 63 | 2011 |
| 10 | Yale University | United States of America | 92.1 | 59.2 | 89.7 | 91.5 |
|
89.5 | 11,751 | 4.4 | 20% | 50 : 50 | 2011 |
glimpse(timesData)
## Rows: 2,603
## Columns: 14
## $ world_rank <chr> "1", "2", "3", "4", "5", "6", "6", "8", "9", "1…
## $ university_name <chr> "Harvard University", "California Institute of …
## $ country <chr> "United States of America", "United States of A…
## $ teaching <dbl> 99.7, 97.7, 97.8, 98.3, 90.9, 90.5, 88.2, 84.2,…
## $ international <chr> "72.4", "54.6", "82.3", "29.5", "70.3", "77.7",…
## $ research <dbl> 98.7, 98.0, 91.4, 98.1, 95.4, 94.1, 93.9, 99.3,…
## $ citations <dbl> 98.8, 99.9, 99.9, 99.2, 99.9, 94.0, 95.1, 97.8,…
## $ income <chr> "34.5", "83.7", "87.5", "64.3", "-", "57.0", "7…
## $ total_score <chr> "96.1", "96.0", "95.6", "94.3", "94.2", "91.2",…
## $ num_students <chr> "20,152", "2,243", "11,074", "15,596", "7,929",…
## $ student_staff_ratio <dbl> 8.9, 6.9, 9.0, 7.8, 8.4, 11.8, 11.6, 16.4, 11.7…
## $ international_students <chr> "25%", "27%", "33%", "22%", "27%", "34%", "34%"…
## $ female_male_ratio <chr> "", "33 : 67", "37 : 63", "42 : 58", "45 : 55",…
## $ year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
##Education expenditure
knitr::kable(head(educationExpenditure,10),caption="Education expenditure information (first 10 rows)")
| country | institute_type | direct_expenditure_type | X1995 | X2000 | X2005 | X2009 | X2010 | X2011 |
|---|---|---|---|---|---|---|---|---|
| OECD Average | All Institutions | Public | 4.9 | 4.9 | 5.0 | 5.4 | 5.4 | 5.3 |
| Australia | All Institutions | Public | 4.5 | 4.6 | 4.3 | 4.5 | 4.6 | 4.3 |
| Austria | All Institutions | Public | 5.3 | 5.4 | 5.2 | 5.7 | 5.6 | 5.5 |
| Belgium | All Institutions | Public | 5.0 | 5.1 | 5.8 | 6.4 | 6.4 | 6.4 |
| Canada | All Institutions | Public | 5.8 | 5.2 | 4.8 | 5.0 | 5.2 | NA |
| Chile | All Institutions | Public | NA | 4.2 | 3.3 | 4.1 | 4.3 | 3.9 |
| Czech Republic | All Institutions | Public | 4.8 | 4.2 | 4.1 | 4.2 | 4.1 | 4.4 |
| Denmark | All Institutions | Public | 6.5 | 6.4 | 6.8 | 7.5 | 7.6 | 7.5 |
| Estonia | All Institutions | Public | NA | NA | 4.7 | 5.9 | 5.6 | 5.2 |
| Finland | All Institutions | Public | 6.6 | 5.5 | 5.9 | 6.3 | 6.4 | 6.3 |
glimpse(educationExpenditure)
## Rows: 333
## Columns: 9
## $ country <chr> "OECD Average", "Australia", "Austria", "Belgi…
## $ institute_type <chr> "All Institutions ", "All Institutions ", "All…
## $ direct_expenditure_type <chr> "Public", "Public", "Public", "Public", "Publi…
## $ X1995 <dbl> 4.9, 4.5, 5.3, 5.0, 5.8, NA, 4.8, 6.5, NA, 6.6…
## $ X2000 <dbl> 4.9, 4.6, 5.4, 5.1, 5.2, 4.2, 4.2, 6.4, NA, 5.…
## $ X2005 <dbl> 5.0, 4.3, 5.2, 5.8, 4.8, 3.3, 4.1, 6.8, 4.7, 5…
## $ X2009 <dbl> 5.4, 4.5, 5.7, 6.4, 5.0, 4.1, 4.2, 7.5, 5.9, 6…
## $ X2010 <dbl> 5.4, 4.6, 5.6, 6.4, 5.2, 4.3, 4.1, 7.6, 5.6, 6…
## $ X2011 <dbl> 5.3, 4.3, 5.5, 6.4, NA, 3.9, 4.4, 7.5, 5.2, 6.…
##Education attainment
knitr::kable(head(educationalAttainment,10),caption="Education attainment information (first 10 rows)")
| country_name | series_name | X1985 | X1986 | X1987 | X1990 | X1991 | X1992 | X1993 | X1995 | X1996 | X1997 | X1998 | X1999 | X2000 | X2001 | X2002 | X2003 | X2004 | X2005 | X2006 | X2007 | X2008 | X2009 | X2010 | X2011 | X2012 | X2013 | X2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | Barro-Lee: Average years of primary schooling, age 15+, female | 0.33 | NA | NA | 0.44 | NA | NA | NA | 0.57 | NA | NA | NA | NA | 0.75 | NA | NA | NA | NA | 0.86 | NA | NA | NA | NA | 1.27 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 15+, total | 1.03 | NA | NA | 1.26 | NA | NA | NA | 1.54 | NA | NA | NA | NA | 2.01 | NA | NA | NA | NA | 2.18 | NA | NA | NA | NA | 2.64 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 15-19, female | 0.83 | NA | NA | 0.95 | NA | NA | NA | 1.26 | NA | NA | NA | NA | 1.92 | NA | NA | NA | NA | 1.01 | NA | NA | NA | NA | 2.45 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 15-19, total | 2.34 | NA | NA | 2.22 | NA | NA | NA | 2.37 | NA | NA | NA | NA | 3.83 | NA | NA | NA | NA | 2.26 | NA | NA | NA | NA | 3.55 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 20-24, female | 0.54 | NA | NA | 0.92 | NA | NA | NA | 0.94 | NA | NA | NA | NA | 1.26 | NA | NA | NA | NA | 2.00 | NA | NA | NA | NA | 1.29 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 20-24, total | 1.52 | NA | NA | 2.51 | NA | NA | NA | 2.27 | NA | NA | NA | NA | 2.48 | NA | NA | NA | NA | 3.93 | NA | NA | NA | NA | 2.64 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 25+, female | 0.17 | NA | NA | 0.25 | NA | NA | NA | 0.37 | NA | NA | NA | NA | 0.48 | NA | NA | NA | NA | 0.63 | NA | NA | NA | NA | 0.81 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 25+, total | 0.66 | NA | NA | 0.85 | NA | NA | NA | 1.14 | NA | NA | NA | NA | 1.38 | NA | NA | NA | NA | 1.69 | NA | NA | NA | NA | 2.19 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 25-29, female | 0.44 | NA | NA | 0.54 | NA | NA | NA | 0.92 | NA | NA | NA | NA | 0.94 | NA | NA | NA | NA | 1.26 | NA | NA | NA | NA | 1.92 | NA | NA | NA | NA |
| Afghanistan | Barro-Lee: Average years of primary schooling, age 25-29, total | 1.28 | NA | NA | 1.52 | NA | NA | NA | 2.51 | NA | NA | NA | NA | 2.27 | NA | NA | NA | NA | 2.48 | NA | NA | NA | NA | 3.93 | NA | NA | NA | NA |
glimpse(educationalAttainment)
## Rows: 79,055
## Columns: 29
## $ country_name <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ series_name <chr> "Barro-Lee: Average years of primary schooling, age 15+, …
## $ X1985 <dbl> 0.33, 1.03, 0.83, 2.34, 0.54, 1.52, 0.17, 0.66, 0.44, 1.2…
## $ X1986 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1987 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1990 <dbl> 0.44, 1.26, 0.95, 2.22, 0.92, 2.51, 0.25, 0.85, 0.54, 1.5…
## $ X1991 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1992 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1993 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1995 <dbl> 0.57, 1.54, 1.26, 2.37, 0.94, 2.27, 0.37, 1.14, 0.92, 2.5…
## $ X1996 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1997 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1998 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X1999 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2000 <dbl> 0.75, 2.01, 1.92, 3.83, 1.26, 2.48, 0.48, 1.38, 0.94, 2.2…
## $ X2001 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2002 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2003 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2004 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2005 <dbl> 0.86, 2.18, 1.01, 2.26, 2.00, 3.93, 0.63, 1.69, 1.26, 2.4…
## $ X2006 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2007 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2008 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2009 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2010 <dbl> 1.27, 2.64, 2.45, 3.55, 1.29, 2.64, 0.81, 2.19, 1.92, 3.9…
## $ X2011 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2012 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2013 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X2015 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
##School & country
knitr::kable(head(schoolCountry,10),caption="School & country information (first 10 rows)")
| school_name | country |
|---|---|
| Harvard University | United States of America |
| California Institute of Technology | United States of America |
| Massachusetts Institute of Technology | United States of America |
| Stanford University | United States of America |
| Princeton University | United States of America |
| University of Cambridge | United Kingdom |
| University of Oxford | United Kingdom |
| University of California, Berkeley | United States of America |
| Imperial College London | United Kingdom |
| Yale University | United States of America |
glimpse(schoolCountry)
## Rows: 818
## Columns: 2
## $ school_name <chr> "Harvard University", "California Institute of Technology"…
## $ country <chr> "United States of America", "United States of America", "U…
The Center for World University Rankings provides information for top
universities in the World. They use an overal rank per university and
individual values for the National Rank, Quality of Education, Alumni
Employment, Quality of Faculty, Publications, Influence, Citations,
Broad Impact, Patents, and an Total Score. All these values are for a
certain year. For example, Harvard University has the first
position for 2012, with 100.00 overall
score and ranks 1 for most of the dimmensions.
##Best Universities
Let’s see the top 5 of Universities for the years 2012-2015, according to CWUR.
cwurData %>% group_by(year) %>%
select(year,institution,world_rank) %>% top_n(-5, wt = world_rank) -> cwurTop5
plot_ly(cwurTop5, x = ~year) %>%
add_trace(y = cwurTop5$world_rank, name = cwurTop5$institution, showlegend=TRUE, type = 'scatter', mode = 'lines+markers', color= cwurTop5$institution) %>%
layout(title="World Ranked Universities by CWUR (2012-2015)",
xaxis = list(showticklabels = TRUE, tickangle = 0, tickfont = list(size = 8)),
yaxis = list(title = "World rank"),
hovermode = 'compare')
Harvard is on the first place on all the years, followed by MIT on 2012 and Stanford in 2013-2015; the third place is changing between Stanford (2012), Caltech (2013), with MIT holding it for 2014 and 2015.
Harvard Yard (Wikimedia Commons)
Let’s show now top 10, based on World rank, for each year. We show the 3 first places as gold, silver and bronze, with green the rest of the places. We create first a function to show the top.
cwurPlotYear <- function(nYear) {
cwurData %>% filter(year==nYear) %>% top_n(10,-world_rank) %>%
ggplot(aes(x=reorder(institution,-world_rank), y=world_rank)) + geom_bar(stat="identity", aes(fill=reorder(institution,-world_rank)), colour="black") +
theme_bw() + coord_flip() + scale_fill_manual(values=c(rep("lightgreen",7), "#CD7F32", "grey", "gold")) + guides(fill=FALSE) +
labs(x="Institution", y="World Rank",
title=paste("Rank in ",nYear), subtitle="(smaller value is better)")
}
cwurPlotYear(2012) -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurPlotYear(2013) -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurPlotYear(2014) -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurPlotYear(2015) -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3,d4, ncol=2)
Let’s select top 10 countries - top calculated based on the number of universities (on all years). For this top 10 we will look to the various metrics: rank based on number of publications, number of citations, number of patents. All these factors are relevant for the quality of research done in the respective university. The better the ranks related to research, the higher the quality of the research performed in the universities in that country. We also look to quality of education, alumni employment and quality of faculty ranks. Totally, there are 6 different dimmensions we analyze here.
cwurData %>% group_by(country) %>% summarise(n = length(publications)) %>% top_n(10,n) %>% ungroup() -> c
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=publications, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by publication",
title="Rank by publication", subtitle="Grouped by country, smaller value is better") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=citations, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by citations",
title="Rank by citations", subtitle="Grouped by country, smaller value is better") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=patents, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by patents",
title="Rank by patents", subtitle="Grouped by country, smaller value is better") -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=quality_of_education, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by quality of education",
title="Rank by quality of education", subtitle="Grouped by country, smaller value is better") -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=alumni_employment, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by alumni employment",
title="Rank by alumni employment", subtitle="Grouped by country, smaller value is better") -> d5
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
cwurData %>% filter(country %in% c$country) %>%
ggplot(aes(x=country, y=quality_of_faculty, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Rank by quality of faculty",
title="Rank by quality of faculty", subtitle="Grouped by country, smaller value is better") -> d6
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3,d4,d5,d6, ncol=2)
USA is leading (smaller values are the best) on all three metrics
relevant for quality of research (publications, citations, patents),
with the average value of the rank as well the smallest. It is followed
by UK on all the indicators, except patents, where South Korea holds an
edge. Canada holds a third place on publications and citations.
As per quality of education, alumni employment and quality of faculty,
we can notice that distributions shows as unbalanced, with average very
close to the third and fourth quartile, and with significant number of
outliers in the first quartile, especially for Spain, Italy, Japan
(quality of education), Spain, South Korea, Italy, France (alumni
employment), South Korea, Japan, Germany, France, Canada (quality of
faculty). This means that on these three indicators, there are few high
performing universities, with best ranks (smaller) whilst the majority
of the universities in these countries have averages over 200. An
exception are US and UK, where there is a large variation for quality of
education and alumni empoloyment and averages are much lower than for
the other top countries. We define a function to plot Country University
Rankings. We are using the number of universities in the CWUR top as the
main metric (will use a color range showing the number of Universities
in the top, from red the smallest number to blue the largest number). We
also show the average rank, the min (best) and max (worst) world
rank.
cwurData %>% group_by(country,year) %>%
summarise(nr = length(world_rank), minw=min(world_rank), maxw=max(world_rank), avgw=round(mean(world_rank),0)) %>%
select(country, year, nr, minw, maxw, avgw) %>% ungroup() -> ccwur
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)
ccwur$hover <- with(ccwur,
paste("Country: ", country, '<br>',
"Year: ",year, "<br>",
"Universities in top: ", nr, "<br>",
"Min rank in top: ", minw, "<br>",
"Max rank in top: ", maxw, "<br>",
"Mean rank in top: ", avgw,"<br>"
))
# specify map projection/options
g <- list(
showframe = TRUE,
showcoastlines = TRUE,
projection = list(type = 'orthogonal')
)
plot_geo(ccwur, locationmode = 'country names') %>%
add_trace(
z = ~nr, color = ~nr, colors = 'Spectral', frame = ~year,
text = ~hover, locations=~country, marker = list(line = l)
) %>%
colorbar(title = 'Number of\nuniversities in top', tickprefix = '') %>%
layout(
title = with(ccwur, paste('Number of universities in top<br>Source:<a href="http://cwur.org/">Council of World University Ranking</a>')),
geo = g
)
US is leading the countries top, followed by UK, Canada, France, Germany, Spain, China (last years).
The Shanghai World Rankings for universities have a World Rank (smaller value is the best), a National Rank (per country, smaller value is best) and scores from 0 to 100 for:
All these values are for a corresponding year between 2005 and
2015.
As these values are all ranking between 0 and 100, it will be relevant
to represent them as Spider-Web graphs.
The world rank is a bit misleading, since it is either a single number (like, 1, 2, 3) or an interval (100-400). This will be a bit complicated to use for hierarchy. We can use instead the Total Score (a number between 0 and 100).
First we need to solve an issue with a lot of rows with the Total
Score missing (NA). Let’s see first how many such rows are.
There are 3796 rows with Total Score NA. Fortunatelly, we can calculate
the total_score if we do have the other values. The formula is:
total_score = 0.1 * alumni + 0.2 * award + 0.2 * hici + 0.2 * ns + 0.2 * pub + 0.1 * pcp
Let’s calculate the Total Score from shanghaiData with
this formula for the entries misising the Total Score.
shanghaiDataCld = shanghaiData
shanghaiDataCld$t_score =
0.1 * shanghaiDataCld$alumni + 0.2 * shanghaiDataCld$award + 0.2 * shanghaiDataCld$hici +
0.2 * shanghaiDataCld$ns + 0.2 * shanghaiDataCld$pub + 0.1 * shanghaiDataCld$pcp
shanghaiDataCld$total_score[is.na(shanghaiDataCld$total_score)] = shanghaiDataCld$t_score[is.na(shanghaiDataCld$total_score)]
Let’s check if there is still data with invalid
total_score. The number of data rows with NA
value of total_score is: 22. Let’s eliminate this
additional data, since we cannot fix them.
shanghaiDataCld = shanghaiDataCld[complete.cases(shanghaiDataCld),]
Let’s check now if there are any more rows with
total_score as NA. The number of rows in
shanghaiData is 4897 and the number of rows with Total Score a valid
number is: 4875. The difference is representing by the rows that we
could not fix because we had also other features with NA values, besides
total_score.
The minimum Shanghai Total Score is now 7.99 and the maximum is still 100.
#Fix the duplicate name for University of California-Berkeley
shanghaiDataCld$university_name[shanghaiDataCld$university_name=="University of California-Berkeley"] <- "University of California, Berkeley"
shanghaiDataCld %>% group_by(year) %>%
top_n(10, wt = total_score) %>% select(year,university_name,total_score,alumni, award, hici, ns, pub, pcp) %>% ungroup() -> top10univ
#draw with plotly
plot_ly(top10univ, x = ~year) %>%
add_trace(y = top10univ$total_score, name = top10univ$university_name, showlegend=TRUE, type = 'scatter', mode = 'lines+markers', color= top10univ$university_name) %>%
layout(title="Shanghai (ARWU) World Ranks (2005-2015)<br>Best ranked universities based on total score", legend = list(orientation = 'h'),
xaxis = list(showticklabels = TRUE, tickangle = 0, tickfont = list(size = 8)),
yaxis = list(title = "Total score"),
hovermode = 'compare')
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
The same 10 universities (University of California at Berkley appears with 2 different entries) appears in top 10. Harvard University is leading with a flat 100 points as total score.
Let’s represent the top universities on the same graph. We will use a radarchart (Spider-Web) type of graph.
top10SpiderWebYear <- function(nYear) {
top10univ %>% filter(year==nYear) %>% ungroup() -> top10u
top10 <- as.data.frame(cbind(top10u[,c(3,4,5,6,7,8,9)]))
colnames(top10) <- c("Total Score", "Alumni with Nobel", "Awarded Nobel", "Highly Cited",
"Nature&Science", "Publications", "PCAP")
rownames(top10) <- top10u$university_name
rmin <- apply(top10,2,min); rmax <- apply(top10,2,max)
rmax <- 100
rmin <- 0
colors_border=c( "tomato", "blue", "gold", "green", "magenta",
"yellow", "grey", "lightblue", "brown", "red", "lightgreen", "cyan" )
par(mfrow=c(4,3))
par(mar=c(1,1,5,1))
for(i in 1:nrow(top10)){
colorValue<-(col2rgb(as.character(colors_border[i]))%>% as.integer())/255
radarchart(rbind(rmax,rmin,top10[i,]),
axistype=2 ,
pcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 1),
pfcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 0.5),
plwd=1 , plty=1,cglcol="grey", cglty=1, axislabcol="grey", cglwd=0.5,vlcex=0.7,
title=rownames(top10[i,]))
}
title(paste0('\nShanghai World University Rankings top 10 (',nYear,')'),outer=TRUE,col.main='black',cex.main=1.5)
}
Press on the tab years to see the top 10 universities for each year.
###2005
top10SpiderWebYear(2005)
###2006
top10SpiderWebYear(2006)
###2007
top10SpiderWebYear(2007)
###2008
top10SpiderWebYear(2008)
###2009
top10SpiderWebYear(2009)
###2010
top10SpiderWebYear(2010)
###2011
top10SpiderWebYear(2011)
###2012
top10SpiderWebYear(2012)
###2013
top10SpiderWebYear(2013)
###2014
top10SpiderWebYear(2014)
###2015
top10SpiderWebYear(2015)
Let’s visualize first the top 10 (based on number of universities) - looking to few of the metrics available for AWUR.
For aggregation based on country, we will need to merge Shanghai ranking data with School-Country data.
merge(shanghaiDataCld,schoolCountry, by.x="university_name", by.y="school_name") -> scData
scData %>% group_by(country) %>% summarise(n = length(alumni)) %>% top_n(10,n) %>% ungroup() -> cs
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=alumni, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Alumni with Nobel (score)",
title="Alumni with Nobel (score)", subtitle="Grouped by country") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=award, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Nobel awards score",
title="Nobel awards score", subtitle="Grouped by country") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=hici, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Highly cited score",
title="Highly cited score", subtitle="Grouped by country") -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=ns, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Nature & Science publications score",
title="Nature & Science publications score", subtitle="Grouped by country") -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=pub, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Publications score",
title="Publications score", subtitle="Grouped by country") -> d5
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
scData %>% filter(country %in% cs$country) %>%
ggplot(aes(x=country, y=pcp, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Per capita performance score",
title="Per capita performance score", subtitle="Grouped by country") -> d6
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3, d4, d5, d6, ncol=2)
We can notice the large outliers for Alumni with Nobel, Nobels awards, Highly cited score, Nature & Science publication score and Per capita performance store for Universities in US, UK, Japan, Italy. Few of the top universities in those countries are scientific research powehouses and dominate the tops. With publication score there is a more balanced distribution in the top, with averages quite alined for most of the countries in top 10, and with a larger variance for the scores. Netherlands and China have the largest average values while US and Australia the largest variances. Let’s represent the World countries with the number of universities in the top for all countries.
scData %>% group_by(country, year) %>%
summarise(nr = length(total_score), minw=min(total_score), maxw=max(total_score), avgw=round(mean(total_score),0)) %>%
select(country, year, nr, minw, maxw, avgw) %>% ungroup() -> swur
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)
swur$hover <- with(swur,
paste("Country: ", country, '<br>',
"Year: ",year, "<br>",
"Universities: ", nr, "<br>",
"Min total score: ", minw, "<br>",
"Max total score: ", maxw, "<br>",
"Mean total score: ", avgw,"<br>"
))
# specify map projection/options
g <- list(
showframe = TRUE,
showcoastlines = TRUE,
projection = list(type = 'Mercator')
)
plot_geo(swur, locationmode = 'country names') %>%
add_trace(
z = ~nr, color = ~nr, colors = 'Spectral', frame = ~year,
text = ~hover, locations=~country, marker = list(line = l)
) %>%
colorbar(title = 'Number of\nuniversities', tickprefix = '') %>%
layout(
title = with(swur, paste('Number of universities<br>Source:<a href="http://www.shanghairanking.com">Shanghai Academic World University Rankings</a>')),
geo = g
)
The Times Higher Education data has information from
2011 to 2011. There is a World Rank (smaller values are better) and
coeficients (between 0 and 100) for:
The weight of the above factors in calculating the total score are as following:
The folowwing graph is showing the elements that are included in the calculation of the total score for Times Higher Education.
Formula for calculation of total score (Times Higher Education)
The calculation formula for total_score is:
total_score = 0.3 * teaching + 0.075 * international + 0.3 * research + 0.3 * citations + 0.025 * income
Additionally, in the THE dataset there is information about:
##Calculating missing values for Total Score
We first replace the special character used for missing data with
NA. Then we convert factors to numeric values. Then we
replace NA values with 0. We calculate the total score using the formula
given above. We are replacing the non available values for total score
with the calculated ones.
#replace first the missing values (`-`) with NA
timesData$teaching[timesData$teaching=='-'] <- NA
timesData$international[timesData$international=='-'] <- NA
timesData$research[timesData$research=='-'] <- NA
timesData$citations[timesData$citations=='-'] <- NA
timesData$income[timesData$income=='-'] <- NA
timesData$total_score[timesData$total_score=='-'] <- NA
#replace factors with numeric
timesData$teaching <- as.numeric(as.character(timesData$teaching))
timesData$international <- as.numeric(as.character(timesData$international))
timesData$research <- as.numeric(as.character(timesData$research))
timesData$citations <- as.numeric(as.character(timesData$citations))
timesData$income <- as.numeric(as.character(timesData$income))
timesData$total_score <- as.numeric(as.character(timesData$total_score))
# replace NAs with 0
timesData$income[is.na(timesData$income)] <- 0
timesData$international[is.na(timesData$international)] <- 0
#calculate the total score
timesData$t_score =
0.3 * as.numeric(as.character(timesData$teaching)) +
0.075 * as.numeric(as.character(timesData$international)) +
0.3 * as.numeric(as.character(timesData$research)) +
0.3 * as.numeric(as.character(timesData$citations)) +
0.025 * as.numeric(as.character(timesData$income))
#replace the total_score where missing with the calculated value
timesData$total_score[is.na(timesData$total_score)] <- timesData$t_score[is.na(timesData$total_score)]
##Top universities {.tabset .tabset-fade .tabset-pills}
Let’s represent top 10 for each year.
timesData$wr = as.numeric(as.character(timesData$world_rank))
## Warning: NAs introduced by coercion
thePlotYear <- function(nYear) {
timesData %>% filter(year==nYear) %>% top_n(10,-wr) %>%
ggplot(aes(x=reorder(university_name,-wr), y=wr)) + geom_bar(stat="identity", aes(fill=reorder(university_name,-wr)), colour="black") +
theme_bw() + coord_flip() + scale_fill_manual(values=c(rep("lightgreen",7), "#CD7F32", "grey", "gold")) + guides(fill=FALSE) +
labs(x="University name", y="World Rank",
title=paste("Rank in ",nYear), subtitle="(smaller value is better)")
}
thePlotYear(2011) -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
thePlotYear(2012) -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
thePlotYear(2013) -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
thePlotYear(2014) -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
thePlotYear(2015) -> d5
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
thePlotYear(2016) -> d6
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3,d4,d5,d6, ncol=2)
We can observe as a significant difference from CWUR that
THE is placing the 3 prestigious universities from UK
(Cambridge, Oxford and Imperial College London) in top-10 (while only
Cambridge and Oxford are in top-10 by CWUR). Both Cambridge and Oxford
are in top 5 for all or a fraction of the years. Harvard University is
on the first place in 1 out of 6 years (2011), Caltech (California
Institute of Technology) being on first place for the consecutive years
2012-2016.
Harvard, Oxford, Stanford are sharing 2nd and 3rd place (in pairs) in
2012, 2013 and 2014.
Panoramic view of Oxford University (Wikimedia Commons)
Let’s represent the top universities on the same graph with several significant dimmensions. We will use a radiochart (Spider-Web) type of graph, as previously for AWUR (Shanghai) data.
timesData %>% group_by(year) %>%
top_n(10, wt = total_score) %>%
select(year,university_name,total_score,teaching, international, research, citations, income) %>% ungroup() -> top10univ
theTop10SpiderWebYear <- function(nYear) {
top10univ %>% filter(year==nYear) %>% ungroup() -> top10u
top10 <- as.data.frame(cbind(top10u[,c(3,4,5,6,7,8)]))
colnames(top10) <- c("Total Score", "Teaching", "International Outlook", "Research",
"Citations","Industry Income")
rownames(top10) <- top10u$university_name
rmin <- apply(top10,2,min); rmax <- apply(top10,2,max)
rmax <- 100
rmin <- 0
colors_border=c( "tomato", "blue", "gold", "green", "magenta",
"yellow", "grey", "lightblue", "brown", "red", "lightgreen", "cyan" )
par(mfrow=c(4,3))
par(mar=c(1,1,5,1))
for(i in 1:nrow(top10)){
colorValue<-(col2rgb(as.character(colors_border[i]))%>% as.integer())/255
radarchart(rbind(rmax,rmin,top10[i,]),
axistype=2 ,
pcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 1),
pfcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 0.5),
plwd=1 , plty=1,cglcol="grey", cglty=1, axislabcol="grey", cglwd=0.5,vlcex=0.7,
title=rownames(top10[i,]))
}
title(paste0('\nTimes Higher Education World University Rankings top 10 (',nYear,')'),outer=TRUE,col.main='black',cex.main=1.5)
}
Press on the tab years to see the top 10 universities for each year.
###2011
theTop10SpiderWebYear(2011)
###2012
theTop10SpiderWebYear(2012)
###2013
theTop10SpiderWebYear(2013)
###2014
theTop10SpiderWebYear(2014)
###2015
theTop10SpiderWebYear(2015)
###2016
theTop10SpiderWebYear(2016)
##All countries rankings {.tabset .tabset-fade .tabset-pills}
We will look to the aggregated values per country (top 10 countries)
for the same factors studied also for the top 10 universities,
i.e. Teaching, International Outlook, Research, Citations, Industry
Income and Overall (total_score).
Where the total score, international and income values are not set, we
will consider them to be 0.
Where total score was not available and the other values are available,
the total score was calculated using the formula described in the
introduction.
timesData %>% group_by(country) %>% summarise(n = length(teaching)) %>% top_n(10,n) %>% ungroup() -> ct
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=teaching, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Teaching score",
title="Teaching score", subtitle="Grouped by country") -> d1
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=international, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="International outlook score",
title="International outlook score", subtitle="Grouped by country") -> d2
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=research, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Research score",
title="Research score", subtitle="Grouped by country") -> d3
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=citations, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Citations score",
title="Citations score", subtitle="Grouped by country") -> d4
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=income, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Industry income score",
title="Industry income score", subtitle="Grouped by country") -> d5
timesData %>% filter(country %in% ct$country) %>%
ggplot(aes(x=country, y=total_score, col=country)) + guides(col=FALSE) +
geom_boxplot() + theme_bw() + coord_flip() +
labs(x="Country", y="Total score",
title="Total score", subtitle="Grouped by country") -> d6
grid.arrange(d1,d2,d3,d4, d5, d6, ncol=2)
Note: the total score is highly affected by missing values. Industry income (income) and International Outlook (international) are as well partially affected.
Let’s represent now the number of universities per countries on a World map. We will use as value for aggregation the total score.
timesData$total_score = as.numeric(as.character(timesData$total_score))
#replace with 0 the missing total_score values - this will affect the aggregated values
timesData %>% group_by(country,year) %>%
summarise(nr = length(total_score), minw=min(total_score), maxw=max(total_score), avgw=round(mean(total_score),0)) %>%
select(country, year, nr, minw, maxw, avgw) %>% ungroup() -> ther
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)
ther$hover <- with(ther,
paste("Country: ", country, '<br>',
"Year: ",year, "<br>",
"Universities: ", nr, "<br>",
"Min total score: ", minw, "<br>",
"Max total score: ", maxw, "<br>",
"Mean total score: ", avgw,"<br>"
))
# specify map projection/options
g <- list(
showframe = TRUE,
showcoastlines = TRUE,
projection = list(type = 'Mercator')
)
plot_geo(ther, locationmode = 'country names') %>%
add_trace(
z = ~nr, color = ~nr, colors = 'Spectral', frame = ~year,
text = ~hover, locations=~country, marker = list(line = l)
) %>%
colorbar(title = 'Number of\nuniversities', tickprefix = '') %>%
layout(
title = with(ther, paste('Number of universities<br>Source:<a href="https://www.timeshighereducation.com/world-university-rankings">Times Higher Education World University Ranking</a>')),
geo = g
)
As we already explained, THE contains additional metrics (Number of FTE Students, Number of Students per Staff, International Students, Female-male ratio). The first three variables can be used directly, for female-male ratio we will need to process the data.
timesData$female_ratio <-
sapply(strsplit(as.character(timesData$female_male_ratio), ":"), "[", 1)
Let’s represent these metrics for the top 10 universities in the world.
timesData %>% filter(university_name %in% top10univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(gsub(",","",num_students)))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="darkgreen", colour="black", position="dodge")+ theme_bw() + coord_flip() +
labs(x="University", y="FTE Students",
title="FTE Students", subtitle="Top 10 universities") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top10univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(gsub("%","",international_students)))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightblue", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="International Students percentage",
title="International Students percentage [%]", subtitle="Top 10 universities") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top10univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(female_ratio))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="pink", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="Female percentage",
title="Female percentage [%]", subtitle="Top 10 universities") -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top10univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(student_staff_ratio))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightgreen", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="Student staff ratio",
title="Student staff ratio (number students per one staff)", subtitle="Top 10 universities") -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3,d4,ncol=1)
## Warning: Removed 6 rows containing missing values (geom_bar).
We can see that the Caltech has the smaller number of students, UCLA the largest, Yale University has the largest percent of female students (50%), ETH Zurich the smallesr (but above 30%), Yale has the smallest Student-staff ratio (less than 5) while University of California at Berkeley the largest (~16), the percentage of foreign students is largest at University College London (above 50%) and it is the smalles at University of California at Berkeley (~15%).
#Romanian Universities in World Rankings
I am a Romanian so I would like to see how Romanian Universities are placed in the World Rankings. Unfortunatelly, just a few universities are present, they are very low in the hierarchy, and this only in recent years.
Let’s look first to the Romanian universities present in CWUR hierarchy.
cwurData %>% filter(country=="Romania") %>%
knitr::kable(caption="Center of World University Ranking information: Romanian Universities presence")
| world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 957 | University of Bucharest | Romania | 1 | 321 | 478 | 210 | 837 | 776 | 609 | 937 | 737 | 44.32 | 2014 |
| 986 | University of Bucharest | Romania | 1 | 237 | 567 | 218 | 845 | 804 | 645 | 975 | 871 | 44.04 | 2015 |
| 987 | Babeș-Bolyai University | Romania | 2 | 367 | 567 | 218 | 874 | 855 | 812 | 958 | 871 | 44.04 | 2015 |
There are just two of them, University of Bucharest
(appearing in 2014 and 2015) and Babeș-Bolyai University
(in 2015). The score is around 44, on a scale from 0 to 100 (Harvard
University has usually 100). Shanghai (AWUR) does not contain any
Romanian university. Let’s look now to the Romanian universities present
in THE hierarchy.
timesData %>% filter(country=="Romania") %>%
knitr::kable(caption="Times New Education World University Ranking information: Romanian Universities presence")
| world_rank | university_name | country | teaching | international | research | citations | income | total_score | num_students | student_staff_ratio | international_students | female_male_ratio | year | t_score | wr | female_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 501-600 | Babeş-Bolyai University | Romania | 27.9 | 35.4 | 12.5 | 32.1 | 28.6 | 25.1200 | 37,915 | 23.5 | 2% | 67 : 33 | 2016 | 25.1200 | NA | 67 |
| 601-800 | Alexandru Ioan Cuza University | Romania | 24.9 | 46.9 | 13.6 | 7.0 | 28.2 | 17.8725 | 25,724 | 33.0 | 5% | 63 : 37 | 2016 | 17.8725 | NA | 63 |
| 601-800 | University of Bucharest | Romania | 34.3 | 21.3 | 11.5 | 9.9 | 29.5 | 19.0450 | 31,806 | 24.9 | 2% | 70 : 30 | 2016 | 19.0450 | NA | 70 |
| 601-800 | West University of Timişoara | Romania | 16.1 | 21.0 | 3.9 | 22.4 | 0.0 | 14.2950 | 12,933 | 19.0 | 3% | 62 : 38 | 2016 | 14.2950 | NA | 62 |
Four universities made the hierarchy in 2016 (the first time). Let’s represent these data from THE about Romanian Universities against the same year top 5 universities data.
timesData %>% filter(country=="Romania") %>%
select(year,university_name,total_score,teaching, international, research, citations, income) -> theROData
timesData %>% group_by(year) %>%
top_n(5, wt = total_score) %>%
select(year,university_name,total_score,teaching, international, research, citations, income) %>% ungroup() -> top5univ
top5univ <- rbind(theROData,top5univ)
theTop5SpiderWebYear <- function(nYear) {
top5univ %>% filter(year==nYear) %>% ungroup() -> top5u
top5 <- as.data.frame(cbind(top5u[,c(3,4,5,6,7,8)]))
colnames(top5) <- c("Total Score", "Teaching", "International Outlook", "Research",
"Citations","Industry Income")
rownames(top5) <- top5u$university_name
rmin <- apply(top5,2,min); rmax <- apply(top5,2,max)
rmax <- 100
rmin <- 0
colors_border=c( "tomato", "blue", "gold", "green", "magenta",
"yellow", "grey", "lightblue", "brown", "red", "lightgreen", "cyan" )
par(mfrow=c(3,3))
par(mar=c(1,1,5,1))
for(i in 1:nrow(top5)){
colorValue<-(col2rgb(as.character(colors_border[i]))%>% as.integer())/255
radarchart(rbind(rmax,rmin,top5[i,]),
axistype=2 ,
pcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 1),
pfcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 0.5),
plwd=1 , plty=1,cglcol="grey", cglty=1, axislabcol="grey", cglwd=0.5,vlcex=0.7,
title=rownames(top5[i,]))
}
title(paste0('\nTHE World University Rankings top 5 vs Romanian Univ. (',nYear,')'),outer=TRUE,col.main='black',cex.main=1.5)
}
theTop5SpiderWebYear(2016)
For all variables, the performance is quite poor for the Romanian universities. They range a bit better on teaching and industry income but typically the performance is poor for research and generally poor for citations. The conclusion is simple. For these Romanian Universities best placed in 2016 in THE World University Rankings (ranking in 501-600 range and601-800 range) there is a lot to be improved (especially in research) in order to approach the performance of the top universities in the World.
Let’s check now the additional metrics for THE for the Romanian Universities vs. top 5 World Universities.
#merge top 5 World Universities with the 4 Romanian Universities
timesData %>% filter(country=="Romania") %>%
select(university_name,num_students, international_students,female_ratio, student_staff_ratio) -> theROData
timesData %>% filter(year==2016) %>% top_n(5, wt = total_score) %>%
select(university_name,num_students, international_students,female_ratio, student_staff_ratio) %>% ungroup() -> top5univ
top5univ <- rbind(theROData,top5univ)
timesData %>% filter(university_name %in% top5univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(gsub(",","",num_students)))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="darkgreen", colour="black", position="dodge")+ theme_bw() + coord_flip() +
labs(x="University", y="FTE Students",
title="FTE Students", subtitle="Top 5 World Universities against top Romanian Universities") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top5univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(gsub("%","",international_students)))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightblue", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="International Students percentage",
title="International Students percentage [%]", subtitle="Top 5 World Universities against top Romanian Universities") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top5univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(female_ratio))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="pink", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="Female percentage",
title="Female percentage [%]", subtitle="Top 5 World Universities against top Romanian Universities") -> d3
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
timesData %>% filter(university_name %in% top5univ$university_name) %>%
ggplot(aes(x=university_name, y=as.numeric(as.character(student_staff_ratio))),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightgreen", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="University", y="Student staff ratio",
title="Student staff ratio (number students per one staff)", subtitle="Top 5 World Universities against top Romanian Universities") -> d4
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,d3,d4,ncol=1)
University of Bucharest, Babes-Bolyai University (Cluj) and Alexandru Ioan Cuza University (Iasi) have all larger numbers of full-time students than the top 5 World Universities. Only Timisoara University (a much smaller town) has a smaller number of students, still, one of the largest compared with top 5 World Universities. The student-staff ratio is much higher than the top 5 for all Romanian universities (smaller than 20 for West University of Timisoara) with Alexandru Ioan Cuza (Iasi) topping at over 30 students per staff. The number of international students is small for all 4 Romanian Universities in THE Rankings, with values between 2 and 5%. Alexandru Ioan Cuza University (Iasi) has almost 5% international students. As per the female-male ratio, here we see something characteristic to former communist countries but also a sign of modernity: female-male ratio is higher than 60% for all universities (65% for University of Bucharest), for two reasons: historically, female-male ratio is high, due to egalitarian tradition in former communist countries; as well, female ratio tend to be higher in non-technical faculties and all these Universities does not include technical faculties (traditionally segregated in former communist countries to separate higher education institutions: here the female-male ratio is smaller).
timesData %>% filter(country %in% c("Romania","Bulgaria", "Hungary", "Serbia", "Slovakia", "Slovenia")) %>%
select(year,university_name,total_score,teaching, international, research, citations, income) -> theEEData
theEESpiderWebYear <- function(nYear) {
theEEData %>% filter(year==nYear) %>% ungroup() -> top5u
top5 <- as.data.frame(cbind(top5u[,c(3,4,5,6,7,8)]))
colnames(top5) <- c("Total Score", "Teaching", "International Outlook", "Research",
"Citations","Industry Income")
rownames(top5) <- top5u$university_name
rmin <- apply(top5,2,min); rmax <- apply(top5,2,max)
rmax <- 100
rmin <- 0
colors_border=c( "tomato", "blue", "gold", "green", "magenta",
"yellow", "grey", "lightblue", "brown", "red",
"lightgreen", "cyan","tomato", "blue", "gold" )
par(mfrow=c(5,3))
par(mar=c(1,1,5,1))
for(i in 1:nrow(top5)){
colorValue<-(col2rgb(as.character(colors_border[i]))%>% as.integer())/255
radarchart(rbind(rmax,rmin,top5[i,]),
axistype=2 ,
pcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 1),
pfcol=rgb(colorValue[1],colorValue[2],colorValue[3], alpha = 0.5),
plwd=1 , plty=1,cglcol="grey", cglty=1, axislabcol="grey", cglwd=0.5,vlcex=0.7,
title=rownames(top5[i,]))
}
title(paste0('\nTHE World University Rankings: Former Communist countries (',nYear,')'),outer=TRUE,col.main='black',cex.main=1.5)
}
theEESpiderWebYear(2016)
Let’s process the information about Education expenditure. There are
37 countries (or suprastatal entities, like OECD, for
reference) analyzed. As well, there are 3 different
institute types included:
All Institutions , Elementary and Secondary Institutions , Higher Education Institutions.
As for the direct expenditure types, there are 3 diferent
values: Public, Private, Total. The data is for 6 different
years, 1995, 2000, 2005,
2009, 2010 and 2011.
Let’s reshape the data, to replace the columns for years with rows.
cols <- c("country","institute_type", "direct_expenditure_type")
#use melt to flatten the expenditure ration/years
educationExpenditureFlatten <- melt(educationExpenditure, id=cols)
colnames(educationExpenditureFlatten) <- c(cols,"year","ratio")
#extract the year number
educationExpenditureFlatten$year <- as.integer(substring(educationExpenditureFlatten$year, 2))
We prepare next the function for plotting the education expenditure
for all the countries in the World (for which we do have the data).
Before presenting the institute_type and
direct_expenditure_type to the function, we convert the
factor in characters and trim the white spaces (so that we can use
easier the filters).
educationExpenditureFlatten$institute_type = trimws(as.character(educationExpenditureFlatten$institute_type))
educationExpenditureFlatten$direct_expenditure_type = trimws(as.character(educationExpenditureFlatten$direct_expenditure_type))
educationExpenditureRatioYear <- function(nType, nDirectExpenditureType) {
educationExpenditureFlatten %>% filter(
institute_type==nType, direct_expenditure_type == nDirectExpenditureType) -> eef
# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)
eef$hover <- with(eef,
paste("Country: ", country, '<br>',
"Year: ",year, "<br>",
"Ratio: ", ratio
))
# specify map projection/options
g <- list(showframe = TRUE, showcoastlines = TRUE, projection = list(type = 'Mercator'))
plot_geo(eef, locationmode = 'country names') %>%
add_trace(z = ~ratio, color = ~ratio, colors = 'Spectral', frame = ~year,
text = ~hover, locations=~country, marker = list(line = l)) %>%
colorbar(title = 'Ratio', tickprefix = '%') %>%
layout(
title = with(eef, paste('Education expenditure ratio;<br>Institution type:<b>',nType,'</b>; Expenditure type:<b>',nDirectExpenditureType,'</b>)')),
geo = g
)
}
##All Institutions {.tabset .tabset-fade .tabset-pills}
Click on Public, Private and Total expenditure tabs to see the data for each type of expenditure.
###Public expenditure {.tabset .tabset-fade .tabset-pills}
Let’s show now the Public expenditures for All institutions (World map for all the years).
The public expenditures ratio for all institutions are highest in 1995 in Norway (6.8%), Sweden and Finland (6.6%), Denmark (6.4%), followed by France & Canada (5.8%), Portugal (5.4%) and New Zeeland or Austria (5.3%). US has only 5%, being below Poland (5.2%). Norway, Sweden, Denmark, Finland keep their leading position from 1995 to 2011, with the top values for 2011 being for Denmark (7.5%) and Norway (7.3%). US, Germany, UK have values in those years between 4.4% and 5.6%. In 2011 US is behind countries like Brazil (5.9%) or even Mexic (5%).
educationExpenditureRatioYear("All Institutions", "Public")
###Private expenditure {.tabset .tabset-fade .tabset-pills}
Let’s inspect now the Private expenditures for All institutions (World map for all the years). There are no data about private expenditure for education for the years 1995 to 2010. We only have data for 2011. For this year, South Korea is leading with 2.8% ratio, followed by Chile (2.5%), United States (2.2%), Japan (1.6%), Australia (1.5%), New Zeeland (1.2%) and Mexic (1.1%).
educationExpenditureRatioYear("All Institutions", "Private")
###Total expenditure {.tabset .tabset-fade .tabset-pills}
Let’s inspect now the Total expenditures for All institutions (World map for all the years). The Total expenditure for All institutions data is available only for 2011. In 2011 the countries with largest Total expenditure for All institutions are Denmark (7.9%), Iceland (7.7%), South Korea (7.6%), New Zeeland (7.5%), United States (6.9%) and Finland (6.5%).
educationExpenditureRatioYear("All Institutions", "Total")
##Elementary and Secondary Institutions {.tabset .tabset-fade .tabset-pills}
Click on Public, Private and Total expenditure tabs to see the data for each type of expenditure.
###Public expenditure {.tabset .tabset-fade .tabset-pills}
Let’s visualize now the Public expenditures for Elementary and
Secondary Institutions (World map for all the years).
North European countries are in the top starting with 1995. Sweden
(4.4%), Finland (4.2%), Norway (4.1%) are leading the World and European
countries. Follows France (4.1%), Switzerland (4.1%) and Canada (4.0%).
UK has only 3.8% and US 3.5%. In the following years the top keeps the
Nordics in the top but we see New Zeeland on first place with 4.6%
(2000), Iceland with 5.2% (2005, 2009). In te following years Norway is
leading with 5.1% (2010) and 4.9% (2011).
educationExpenditureRatioYear("Elementary and Secondary Institutions", "Public")
###Private expenditure {.tabset .tabset-fade .tabset-pills}
Let’s see now the Privates expenditures for Elementary and Secondary Institutions (World map for all the years).
For the years from 1995 to 2010, there are no data for Elementary and Secondary Institutions (private expenditures). For the year 2011, the first place belongs to South Korea (0.8%). South Korea is followed by Chile (0.7%), Australia, New Zeeland and Mexic (0.6%). In Europe, Switzerland has the best ratio (0.5%).
educationExpenditureRatioYear("Elementary and Secondary Institutions", "Private")
###Total expenditure {.tabset .tabset-fade .tabset-pills}
Let’s see now the Total expenditures for Elementary and Secondary Institutions (World map for all the years).
The Total expenditure for Elementary and Secondary Institutions is available only for 2011. In this year the top countries are New Zeeland (5.4%), Iceland (4.9%), United Kingdom (4.7%), Ireland (4.6%), Denmark and Belgium (both with 4.4%).
educationExpenditureRatioYear("Elementary and Secondary Institutions", "Total")
##Higher Education Institutions {.tabset .tabset-fade .tabset-pills}
Click on Public, Private and Total expenditure tabs to see the data for each type of expenditure.
###Public expenditure {.tabset .tabset-fade .tabset-pills}
Let’s visualize now the Public expenditures for Higher Education Institutions (World map for all the years).
For Higher Education Institution, starting with 1995, the Public expenditures ratios were maximum in Finland (1.7%), followed by Sweden (1.6%), Norway and Canada (1.5%), Denmark (1.3%). United States was on a lower place, with only 1.1%, preceeded by Australia, with 1.2%. Through the years from 1995 to 2011, US kept his ratio between 0.9% and 1.1%. Finland kept his leading position, increasing the ratio to 1.9% for 2010 and 2011. Sweden, Norway and Denmark kept as well their leading position, with values above 1.5%. Greece has an interesting apparition in 2005 with a ratio of 1.4%, one of the largest. Israel has as well a good position, with values of ratio in the same range as US. Between European countries with good academic performance, France has a constant good ratio around and larger than 1%, with a record value of 1.3% in 2009.
educationExpenditureRatioYear("Higher Education Institutions", "Public")
###Private expenditure {.tabset .tabset-fade .tabset-pills}
Let’s see now the Privates expenditures for Higher Education Institutions (World map for all the years).
There are no Higher Education Institution Private education expenditure ratio information for the years 1995 to 2010. In 2011, the top is dominated by South Korea (1.9%), United States (1.8%), Chile (1.7%), Japan (1%), Australia (0.9%), Israel (0.9%), Netherlands and Russia (0.5%).
educationExpenditureRatioYear("Higher Education Institutions", "Private")
###Total expenditure {.tabset .tabset-fade .tabset-pills}
Let’s see now the Total expenditures for Higher Education Institutions (World map for all the years). The Total expenditure for Higher Education Institutions data is only available for 2011. In this year United States tops the countries (2.7%), followed by South Korea (2.6%), Chile (2.4%), Finland (1.9%), Denmark (1.9%), Nederland (1.8%). France and Ireland have both 1.5% while United Kingdom 1.2%.
educationExpenditureRatioYear("Higher Education Institutions", "Total")
#Is expenditure alone explaining performance?
Let’s plot the top 10 countries in parallel with top expenditure (as a percent from GDP); what we try to understand is if percent allocated for education is enough to explain performance in education.
cwurData %>% filter(year==2012) %>% group_by(country) %>%
summarise(nr = length(world_rank)) %>% top_n(10,nr) %>%
ggplot(aes(x=reorder(country,nr), y=nr),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightgreen", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="Country", y="Number of universities in top",
title="Number of universities (2012)", subtitle="Top 10 countries - World University Rankings (CWUR)") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
educationExpenditureFlatten %>% filter(year==2011,institute_type=="Higher Education Institutions", direct_expenditure_type =="Total") %>% top_n(10,ratio) %>%
ggplot(aes(x=reorder(country,ratio), y=ratio),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="gold", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="Country", y="Total expenditure percent",
title="Total education expenditure (2011)", subtitle="Top 10 countries - Higher Education Institutions") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,ncol=2)
Let’s merge Education expenditure per country data with UN Data showing the GDP for the countries. This will help us normalize the expenditure per country, since the ratio that we were using until now aplies to very different GDP.
Here we load the country profiles from UN Data.
undata <- read.csv("data/country_data/country_profile_variables.csv")
Let’s visualize the total education expenditure in 2011, for the top 10 countries, only the Higher Education Institutions.
undata$country <- as.character(undata$country)
undata$country[undata$country=="United States of America"] <- "United States"
education_expenditure_total <- merge(undata,educationExpenditureFlatten, by="country")
education_expenditure_total$eet = education_expenditure_total$GDP..Gross.domestic.product..million.current.US.. * education_expenditure_total$ratio / 100 / 1000
cwurData %>% filter(year==2012) %>% group_by(country) %>%
summarise(nr = length(world_rank)) %>% top_n(10,nr) %>%
ggplot(aes(x=reorder(country,nr), y=nr),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="lightgreen", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="Country", y="Number of universities in top",
title="Number of universities (2012)", subtitle="Top 10 countries - World University Rankings (CWUR)") -> d1
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
education_expenditure_total %>% filter(year==2011,institute_type=="Higher Education Institutions", direct_expenditure_type =="Total") %>% top_n(10,eet) %>%
ggplot(aes(x=reorder(country,eet), y=eet),fill=year) + guides(fill=FALSE) +
geom_bar(stat="identity", fill="gold", colour="black", position=position_dodge(0.2))+ theme_bw() + coord_flip() +
labs(x="Country", y="Total expenditure (Billions USD)",
title="Total education expenditure (2011)", subtitle="Top 10 countries - Higher Education Institutions") -> d2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
grid.arrange(d1,d2,ncol=2)
The result is now much closed to something we expected: Total Higher Education expenditure top includes the now allmost the same countries as the Number of universities in CWUR per country. USA, Japan, United Kingdom, France are all in top 5 in both tops.
The countries that are in top 10 for number of universities in the CWUR and not in top of country Total Higher Education expenditure are Israel, Swizerland and Canada. The countries that are in the Total Higher Education expenditure top but not in the Number of universities in CWUR top 10 are Italy, Spain and Mexico. Israel, United Kingdom, Switzerland, Canada shows a very good efficiency of investment in Higher Education in terms of performance of universities.
cwurData %>%
filter(year==2012) %>%
group_by(country) %>%
summarise(nr = length(world_rank)) -> univ_per_country
univ_per_country$country <- as.character(univ_per_country$country)
univ_per_country$country[univ_per_country$country=="USA"] <- 'United States'
education_expenditure_total %>% filter(year==2011, institute_type=="Higher Education Institutions", direct_expenditure_type =="Total") -> education_expenditure_higher_total
education_expenditure_higher_total = merge(education_expenditure_higher_total,univ_per_country, by="country", all.x=TRUE)
education_features = c("country", "year", "nr","eet",
"Education..Government.expenditure....of.GDP.",
"Surface.area..km2.",
"Population.in.thousands..2017.",
"GDP.per.capita..current.US.." )
education_expenditure_higher_total_f = education_expenditure_higher_total[,education_features]
education_expenditure_higher_total_f <-
education_expenditure_higher_total_f[complete.cases(education_expenditure_higher_total_f), ]
education_expenditure_higher_total_f$Education..Government.expenditure....of.GDP. = as.numeric(education_expenditure_higher_total_f$Education..Government.expenditure....of.GDP.)
education_expenditure_higher_total_f$Surface.area..km2. = as.numeric(education_expenditure_higher_total_f$Surface.area..km2.)
education_expenditure_higher_total_f$Population.in.thousands..2017. = as.numeric(education_expenditure_higher_total_f$Population.in.thousands..2017.)
education_expenditure_higher_total_f$GDP.per.capita..current.US.. = as.numeric(education_expenditure_higher_total_f$GDP.per.capita..current.US..)
names(education_expenditure_higher_total_f) <- c(
"country",
"year",
"Number of universities in top",
"Higher education total expenses",
"Education Gov. expend. % GDP",
"Surface (km2)",
"Population (thousands)",
"GDP per capita (USD)"
)
eeht <- education_expenditure_higher_total_f[,3:8]
correlations <- cor(eeht,method="pearson")
corrplot(correlations, number.cex = .9, tl.cex=0.7, tl.col = "black")
There is a high positive correlation between the number of universities in top per country and the Higher Education total expenses. As well, he population is highly correlated with both number of universities in top and the Higher Educaiton total expenses. Smaller but still positive correlation is beween the GDP per capita and the Education Government expenses percent from GDP.
Let’s represent the number of universities in top per country (in 2012) as a function of Higher Education total expenses (in 2011) per country (we already did see that these values are highly correlated).
education_expenditure_higher_total_f %>%
ggplot(aes(x=`Higher education total expenses`, y=`Number of universities in top`)) + guides(fill=FALSE) +
geom_point() + geom_smooth(method=lm) + theme_bw() +
geom_label_repel(aes(label = country),box.padding=0.15, point.padding=0.25,segment.color = 'blue') +
labs(x="Higher Education total expenses - 2011 [billions USD]", y="Number of universities in top (2012)",
title="Number of universities in top (2012)", subtitle="World University Rankings (CWUR)")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## `geom_smooth()` using formula 'y ~ x'
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Also from this representation we can notice that there is a good correlation between Higher Education total expenses and the number of universitie in top. In terms of efficiency, United Kingdom, Israel, Norway have a higher efficiency, with a number of universities in the World Rankings higher / Higher education total expenses compared with Japan or Italy.
Let’s represent the same graph, without US.
education_expenditure_higher_total_f %>% filter(country != "United States") %>%
ggplot(aes(x=`Higher education total expenses`, y=`Number of universities in top`)) + guides(fill=FALSE) +
geom_point() + geom_smooth(method=lm) + theme_bw() +
geom_label_repel(aes(label = country),box.padding=0.15, point.padding=0.25,segment.color = 'blue') +
labs(x="Higher Education total expenses - 2011 [billions USD]", y="Number of universities in top (2012)",
title="Number of universities in top (2012)", subtitle="World University Rankings (CWUR)")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
## `geom_smooth()` using formula 'y ~ x'
We represented the Center for World University Rankings data (CWUR)
for years 2012-2015, Shanghai - Academic World University Rankings
(AWUR) data for 2005-2015 and Times Higher Education (THE) World
University Rankings (2011-2016). For all three datasets, the top of
Universities (per year - based on World Ranking and Research and
Academic performance related metrics), the country aggregated values
(per year) are represented. Harvard University (followed by few other US
Universities and UK Universities) are dominating in CWUR
and AWUR rankings. In `THE ranking, UK and US universities
are sharing more evenly the top 10.
As for countries, in early years US, UK, Canada, France, Spain, Germany, Australia, Japan are in top, in recent years (qfter 2010) China takes as well a top position. Public expenditure is higher in the Nordic European countries for All Institutions and as well for Higher Education only whilst private spending for Higher Education is highest in United States and South Korea. Public and private expenditures, especially for Higher Education are not explaining alone the high performance of certain Universities. While in the Nordics, South Korea, New Zeeland the high expenditures are not resulting in a consistent presence of the Universities in the World top, the high percents in United States are consistent with the presence of the top US Universities in the World Rankings tops.
Adding data from UNData dataset, to extract the GDP (Gross Domestic Product) allowed us to compare actuall total expenditure (by multiplying the GDP with ratio) and we were able to see a good correlation between top 5 total expenditure for Higher Education / country and the number of universities / country.
This Kernel is still under development. I would highly appreciate your feedback for improvement and, of course, if you like it, please upvote it!
# References
[1] Center for World University Rankings (CWUR), http://cwur.org/
[2] Shanghai Rankings - - Academic Rankings for World Universities
(ARWU), http://www.shanghairanking.com/
[3] Shanghai Rankings - Academic Rankings for World Universities (ARWU)
Methodology, http://www.shanghairanking.com/ARWU-Methodology-2015.html
[4] Times Higher Education (THE) University Rankings, https://www.timeshighereducation.com/world-university-rankings
[5] Public Spending on Education, OECD Data, https://data.oecd.org/eduresource/public-spending-on-education.htm
[6] Private Spendig on Education, OECD Data, https://data.oecd.org/eduresource/private-spending-on-education.htm